Weblog Clustering in Multilinear Algebra Perspective
Abstract
This paper describes a method for clustering the most similar and important weblogs together with their descriptive shared words, using PARAFAC tensor decomposition, a technique from multilinear algebra. The method first builds a labeled-link network representation of the weblog dataset, in which the nodes are the blogs and the link labels are the words they share. A 3-way adjacency tensor is then extracted from the network, and the PARAFAC decomposition is applied to it to obtain pairs of node lists and label lists, where each entry carries a score indicating its degree of importance. Clusters are formed by sorting each list in decreasing order of score and taking the top-ranked blogs and words from each pair. Thus, unlike standard co-clustering methods, this method not only groups similar blogs with their descriptive words but also tends to produce clusters of important blogs and descriptive words.
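The pipeline described in the abstract can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy blogs and words, the binary tensor entries, the rank of 2, the plain alternating-least-squares (ALS) solver, and the choice to rank by absolute score. The helper names (`build_tensor`, `cp_als`, `top_items`) are hypothetical.

```python
import numpy as np

# Toy data (hypothetical): each blog is represented by the set of words it uses.
blogs = {
    "blog_a": {"python", "numpy", "tensor"},
    "blog_b": {"python", "numpy"},
    "blog_c": {"python", "tensor"},
    "blog_d": {"soccer", "goal"},
    "blog_e": {"soccer", "goal", "league"},
}
blog_names = sorted(blogs)
words = sorted(set().union(*blogs.values()))


def build_tensor(blogs, blog_names, words):
    """X[i, j, k] = 1 if distinct blogs i and j both use word k (assumed weighting)."""
    X = np.zeros((len(blog_names), len(blog_names), len(words)))
    for i, a in enumerate(blog_names):
        for j, b in enumerate(blog_names):
            if i == j:
                continue  # no self-links in this sketch
            for w in blogs[a] & blogs[b]:
                X[i, j, words.index(w)] = 1.0
    return X


def unfold(X, mode):
    """Mode-n unfolding: the chosen axis becomes rows, the rest are flattened."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def khatri_rao(A, B):
    """Column-wise Kronecker product, row order matching unfold()."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])


def cp_als(X, rank, n_iter=200, seed=0):
    """Basic PARAFAC (CP) fit of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = (
                unfold(X, mode) @ khatri_rao(first, second) @ np.linalg.pinv(gram)
            )
    return factors


def top_items(scores, names, k):
    """Names of the k entries with the largest absolute score, in decreasing order."""
    order = np.argsort(-np.abs(scores))[:k]
    return [names[i] for i in order]


X = build_tensor(blogs, blog_names, words)
A, B, C = cp_als(X, rank=2)

# Relative reconstruction error of the rank-2 model.
approx = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_error = np.linalg.norm(X - approx) / np.linalg.norm(X)

# One (blogs, words) pair per component: the top-ranked entries of each list.
clusters = [
    (top_items(A[:, r], blog_names, 2), top_items(C[:, r], words, 2))
    for r in range(2)
]
for top_blogs, top_words in clusters:
    print(top_blogs, top_words)
print("relative error:", rel_error)
```

Each component of the decomposition yields one score vector over blogs (two, in fact, one per node mode) and one over words; the sketch reads blog scores from the first node mode only. In practice a tested library implementation of PARAFAC would likely be preferable to this hand-rolled solver.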
Similar resources
Multilinear Complexity is Equivalent to Optimal Tester Size
In this paper we first show that Tester for an F-algebra A and multilinear forms, [2], is equivalent to multilinear algorithm for the product of elements in A, [3]. Our result is constructive in deterministic polynomial time. We show that given a tester of size ν for an F-algebra A and multilinear forms of degree d one can in deterministic polynomial time construct a multilinear algorithm for t...
A Fuzzy Grassroots Ontology for improving Weblog Extraction
This paper presents fuzzy clustering algorithms to establish a grassroots ontology – a machine-generated weak ontology – based on folksonomies. Furthermore, it describes a search engine for vaguely associated terms and aggregates them into several meaningful cluster categories, based on the introduced weak grassroots ontology. A potential application of this ontology, weblog extraction, is illu...
Identifying Network Anomalies Using Clustering Technique in Weblog Data
In this paper we present an approach for identifying network anomalies by visualizing network flow data which is stored in weblogs. Various clustering techniques can be used to identify different anomalies in the network. Here, we present a new approach based on simple K-Means for analyzing network flow data using different attributes like IP address, Protocol, Port number etc. to detect anomal...
Weblog success: Exploring the role of technology
Weblogs have recently gained considerable media attention. Leading weblog sites are already attracting millions of visitors. Yet, success in the highly competitive world of weblogs is not easily achieved. This study seeks to explore weblog success from a technology perspective, i.e. from the impact of weblog-building technology (or blogging tool). Based on an examination of 126 highly successfu...
Journal title:
- CoRR
Volume abs/0909.2345
Publication date 2009